 Winchester


Closing the Modality Gap for Mixed Modality Search

Li, Binxu, Zhang, Yuhui, Wang, Xiaohan, Liang, Weixin, Schmidt, Ludwig, Yeung-Levy, Serena

arXiv.org Artificial Intelligence

Mixed modality search -- retrieving information across a heterogeneous corpus composed of images, texts, and multimodal documents -- is an important yet underexplored real-world application. In this work, we investigate how contrastive vision-language models, such as CLIP, perform on the mixed modality search task. Our analysis reveals a critical limitation: these models exhibit a pronounced modality gap in the embedding space, where image and text embeddings form distinct clusters, leading to intra-modal ranking bias and inter-modal fusion failure. To address this issue, we propose GR-CLIP, a lightweight post-hoc calibration method that removes the modality gap in CLIP's embedding space. Evaluated on MixBench -- the first benchmark specifically designed for mixed modality search -- GR-CLIP improves NDCG@10 by up to 26 percentage points over CLIP and surpasses recent vision-language generative embedding models by 4 percentage points, while using 75x less compute.
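
The abstract does not spell out how the calibration works. As a rough illustration only, one simple post-hoc way to shrink a gap between two embedding clusters is to subtract each modality's mean embedding and renormalize. The sketch below assumes that approach and uses synthetic vectors rather than real CLIP outputs; the function names `remove_modality_gap` and `modality_gap` are hypothetical and are not taken from the paper.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length, as is usual for CLIP-style embeddings.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def remove_modality_gap(image_emb, text_emb):
    # Assumed calibration (not confirmed by the abstract): subtract the
    # per-modality mean embedding, then renormalize each row.
    image_emb = l2_normalize(image_emb)
    text_emb = l2_normalize(text_emb)
    image_centered = image_emb - image_emb.mean(axis=0, keepdims=True)
    text_centered = text_emb - text_emb.mean(axis=0, keepdims=True)
    return l2_normalize(image_centered), l2_normalize(text_centered)

def modality_gap(image_emb, text_emb):
    # Distance between the two modality centroids.
    return float(np.linalg.norm(image_emb.mean(axis=0) - text_emb.mean(axis=0)))

# Synthetic stand-ins for embeddings: shared content plus an opposite
# offset per modality, which produces two distinct clusters.
rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 16))
offset = np.zeros(16)
offset[0] = 5.0
image_emb = l2_normalize(shared[:100] + offset)
text_emb = l2_normalize(shared[100:] - offset)

gap_before = modality_gap(image_emb, text_emb)
img_c, txt_c = remove_modality_gap(image_emb, text_emb)
gap_after = modality_gap(img_c, txt_c)
print(f"centroid gap before: {gap_before:.3f}, after: {gap_after:.3f}")
```

Because the step only needs two mean vectors computed once over a sample of embeddings, it adds almost nothing to retrieval cost, which is the sense in which such a calibration can be called lightweight.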


Deep Privacy Funnel Model: From a Discriminative to a Generative Approach with an Application to Face Recognition

Razeghi, Behrooz, Rahimi, Parsa, Marcel, Sébastien

arXiv.org Artificial Intelligence

In this study, we apply the information-theoretic Privacy Funnel (PF) model to the domain of face recognition, developing a novel method for privacy-preserving representation learning within an end-to-end training framework. Our approach addresses the trade-off between obfuscation and utility in data protection, quantified through logarithmic loss, also known as self-information loss. This research provides a foundational exploration into the integration of information-theoretic privacy principles with representation learning, focusing specifically on face recognition systems. We particularly highlight the adaptability of our framework to recent advancements in face recognition networks, such as AdaFace and ArcFace. In addition, we introduce the Generative Privacy Funnel ($\mathsf{GenPF}$) model, a paradigm that extends beyond the traditional scope of the PF model, referred to as the Discriminative Privacy Funnel ($\mathsf{DisPF}$). This $\mathsf{GenPF}$ model brings new perspectives on data generation methods with estimation-theoretic and information-theoretic privacy guarantees. Complementing these developments, we also present the deep variational PF (DVPF) model. This model proposes a tractable variational bound for measuring information leakage, enhancing the understanding of privacy preservation challenges in deep representation learning. The DVPF model, associated with both $\mathsf{DisPF}$ and $\mathsf{GenPF}$ models, sheds light on connections with various generative models such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Diffusion models. Complementing our theoretical contributions, we release a reproducible PyTorch package, facilitating further exploration and application of these privacy-preserving methodologies in face recognition systems.
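
For orientation, the Privacy Funnel is commonly stated in the information-theoretic literature as the optimization below. The notation is an assumption made here for illustration and may differ from the paper's: $S$ is the sensitive attribute, $X$ the observed data, $Z$ the released representation, and $R$ a required utility level, with the Markov chain $S - X - Z$.

$$
\min_{P_{Z \mid X}} \; I(S; Z) \quad \text{subject to} \quad I(X; Z) \ge R
$$

The link to logarithmic loss is that the self-information loss of a predictive distribution $q$ is $\ell(s, q) = -\log q(s)$, and its minimum expected value given $Z$ is the conditional entropy:

$$
\min_{q} \; \mathbb{E}\left[-\log q(S \mid Z)\right] = H(S \mid Z), \qquad I(S; Z) = H(S) - H(S \mid Z)
$$

Under this reading, minimizing $I(S; Z)$ makes the best achievable log-loss guess of $S$ from $Z$ as poor as possible, while the constraint keeps $Z$ informative about $X$.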


Artificial Intelligence, Robotics, Ethics, and the Military: A Canadian Perspective

Wasilow, Sherry (Defence Research and Development Canada) | Thorpe, Joelle B. (Defence Research and Development Canada)

AI Magazine

Defense and security organizations depend upon science and technology to meet operational needs, predict and counter threats, and meet increasingly complex demands of modern warfare. Artificial intelligence and robotics could provide solutions to a wide range of military gaps and deficiencies. At the same time, the unique and rapidly evolving nature of AI and robotics challenges existing policies, regulations, and values, and introduces complex ethical issues that might impede their development, evaluation, and use by the Canadian Armed Forces (CAF). Early consideration of potential ethical issues raised by military use of emerging AI and robotics technologies in development is critical to their effective implementation. This article presents an ethics assessment framework for emerging AI and robotics technologies. It is designed to help technology developers, policymakers, decision makers, and other stakeholders identify and broadly consider potential ethical issues that might arise with the military use and integration of emerging AI and robotics technologies of interest. We also provide a contextual environment for our framework, as well as an example of how our framework can be applied to a specific technology. Finally, we briefly identify and address several pervasive issues that arose during our research.


Social worker takes her first steps in 14 YEARS – thanks to an £80,000 robotic exoskeleton

Daily Mail - Science & tech

This is the incredible moment a social worker takes her first steps in 14 years - with the help of an £80,000 robotic suit. Lucy Dodd, from Aldershot, was suddenly struck by a rare congenital malformation of blood vessels and left paralysed from the waist down as a teenager. But now the social worker, who has been wheelchair-bound since she was 19, can move her legs again through a revolutionary exoskeleton. In moving footage of her using the bionic contraption that allows paraplegics to stand and move, Ms Dodd can be seen striding out. The 34-year-old is now desperately trying to raise the five-figure sum for the ReWalk exoskeleton in order to get her mobility back.


What Amazon Alexa pays the people building its skills

#artificialintelligence

The Alexa skills economy is still in its infancy. On a lark, Joel Wilson started developing skills for Alexa, Amazon's voice assistant, this past January. After a few weeks of coding, he launched two skills -- Amazon's term for voice-controlled apps -- called Question of the Day and Three Questions. "I was just doing it for fun," said Wilson, 47, CEO of a small marketing analytics company in Washington, DC. In May, he got an email from Amazon telling him to expect a check in the mail as part of a new program that pays cash to makers of popular skills. That first month, Amazon sent him $2,000. It got better from there. He's received checks for $9,000 over each of the past three months, he said. Wilson unexpectedly joined a new Alexa economy, a small but fast-growing network of independent developers, marketing companies and Alexa tools makers. They're working to bring you voice-activated flash briefings, games and recipes through Amazon's Echo speaker, Alexa's primary home. By doing so, they hope to define the 3-year-old Alexa platform and make money from voice computing's surging popularity. Two years ago, there wasn't nearly as much to do on Alexa and the market for making Alexa skills was worth a mere $500,000. Now, with more than 25,000 skills available, the market is expected to hit $50 million in 2018, according to analytics firm VoiceLabs. That's dwarfed by the mobile app economy, with global sales of over $50 billion, but Alexa is growing at a far faster rate. Customers rarely pay these developers and marketers directly, but they have a big stake in these workers' efforts. Their success, or failure, will determine the number and the quality of skills, such as more complex games, better smart-home controls and more services from companies like Lyft or Domino's Pizza.
Alexa is an increasingly important business for Amazon, which is expanding the assistant into millions of internet-connected gadgets and moving it into the workplace. Drawing in more developers will help the company sell more Alexa-powered devices and strengthen its top-dog status in voice. It's sold more than 20 million Echo speakers in the US, taking up 70 percent of the market and helping Alexa become the most active voice market for developers today. "Every skill makes Alexa smarter or more useful," Rob Pulciani, director of Amazon Alexa, said in a statement to CNET. "We can't do that by ourselves and we want to enable indie developers to innovate and extend Alexa capabilities at a rapid pace."


Claude Shannon: Reluctant Father of the Digital Age

AITopics Original Links

Pick up a favorite CD. Then slide it into the slot on the player -- and listen as the music comes out just as crystal clear as the day you first opened the plastic case. Before moving on with the rest of your day, give a moment of thought to the man whose revolutionary ideas made this miracle possible: Claude Elwood Shannon. Shannon, who died in February after a long illness, was one of the greatest of the giants who created the information age. John von Neumann, Alan Turing and many other visionaries gave us computers that could process information. But it was Claude Shannon who gave us the modern concept of information -- an intellectual leap that earns him a place on whatever high-tech equivalent of Mount Rushmore is one day established. The entire science of information theory grew out of one electrifying paper that Shannon published in 1948, when he was a 32-year-old researcher at Bell Laboratories.


Claude Shannon, the Father of the Information Age, Turns 1100100

The New Yorker

Twelve years ago, Robert McEliece, a mathematician and engineer at Caltech, won the Claude E. Shannon Award, the highest honor in the field of information theory. During his acceptance lecture, at an international symposium in Chicago, he discussed the prize's namesake, who died in 2001. Claude Shannon: Born on the planet Earth (Sol III) in the year 1916 A.D. Generally regarded as the father of the information age, he formulated the notion of channel capacity in 1948 A.D. Within several decades, mathematicians and engineers had devised practical ways to communicate reliably at data rates within one per cent of the Shannon limit. As is sometimes the case with encyclopedias, the crisply worded entry didn't quite do justice to its subject's legacy. That humdrum phrase--"channel capacity"--refers to the maximum rate at which data can travel through a given medium without losing integrity.
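
To make "channel capacity" concrete, here is a small worked example. The excerpt names no formula, so this sketch assumes the well-known Shannon-Hartley result for a band-limited channel with additive white Gaussian noise, C = B log2(1 + S/N). The bandwidth and signal-to-noise figures are illustrative assumptions, not values from the article.

```python
import math

def shannon_hartley_capacity(bandwidth_hz, snr_linear):
    # Capacity in bits per second of a band-limited channel with
    # additive white Gaussian noise: C = B * log2(1 + S/N).
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# Illustrative numbers only: a 3 kHz channel at 30 dB signal-to-noise ratio.
bandwidth_hz = 3000.0
snr_db = 30.0
snr_linear = 10 ** (snr_db / 10)  # convert decibels to a plain power ratio

capacity_bps = shannon_hartley_capacity(bandwidth_hz, snr_linear)
print(f"capacity: about {capacity_bps:,.0f} bits per second")
```

No coding scheme can reliably exceed this rate on such a channel, and operating "within one per cent of the Shannon limit" means achieving at least 99 percent of it.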


Claude Shannon: Tinkerer, Prankster, and Father of Information Theory

#artificialintelligence

Editor's note: This month marks the centennial of the birth of Claude Shannon, the American mathematician and electrical engineer whose groundbreaking work laid out the theoretical foundation for modern digital communications. To celebrate the occasion, we're republishing online a memorable profile of Shannon that IEEE Spectrum ran in its April 1992 issue. Written by former Spectrum editor John Horgan, who interviewed Shannon at his home in Winchester, Mass., the profile reveals the many facets of Shannon's character: While best known as the father of information theory, Shannon was also an inventor, tinkerer, puzzle solver, and prankster. The 1992 profile included a portrait of Shannon taken by Boston-area photographer Stanley Rowin. On this page we're reproducing that portrait along with other Shannon photos by Rowin that Spectrum has never published. Shannon died in 2001 at age 84 after a long battle with Alzheimer's disease. He is regarded as one of the greatest electrical engineering heroes of all time.